A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning

Authors

  • Alborz Geramifard
  • Thomas J. Walsh
  • Stefanie Tellex
  • Girish Chowdhary
  • Nicholas Roy
  • Jonathan P. How
Abstract

A Markov Decision Process (MDP) is a natural framework for formulating sequential decision-making problems under uncertainty. In recent years, researchers have greatly advanced algorithms for learning and acting in MDPs. This article reviews such algorithms, beginning with well-known dynamic programming methods for solving MDPs such as policy iteration and value iteration, then describes approximate dynamic programming methods such as trajectory-based value iteration, and finally moves to reinforcement learning methods such as Q-Learning, SARSA, and least-squares policy iteration. We describe algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each. Empirical evaluations of these techniques with four representations across four domains provide insight into how these algorithms perform with various feature sets in terms of running time and performance.

A. Geramifard, T. J. Walsh, S. Tellex, G. Chowdhary, N. Roy, and J. P. How. A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning. Foundations and Trends® in Machine Learning, vol. 6, no. 4, pp. 375–451, 2013. DOI: 10.1561/2200000042.
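To make the setting concrete, the sketch below implements SARSA with a linear function approximator, one of the reinforcement learning methods the tutorial covers. The chain MDP, the one-hot feature map, and all hyperparameter values are illustrative assumptions, not taken from the paper; this is a minimal sketch rather than the tutorial's own implementation.

```python
# Minimal sketch of SARSA with a linear function approximator.
# The chain MDP, one-hot features, and hyperparameters are illustrative
# assumptions, not details from the tutorial.
import numpy as np

n_states, n_actions = 5, 2           # tiny chain MDP: move left (0) or right (1)
gamma, alpha, epsilon = 0.95, 0.1, 0.1

def features(s, a):
    """One-hot feature vector phi(s, a) of length n_states * n_actions."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

def step(s, a):
    """Deterministic chain dynamics; reward 1 only on reaching the right end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

def epsilon_greedy(w, s, rng):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax([w @ features(s, a) for a in range(n_actions)]))

rng = np.random.default_rng(0)
w = np.zeros(n_states * n_actions)   # weights: Q(s, a) is approximated by w . phi(s, a)

for episode in range(200):
    s = 0
    a = epsilon_greedy(w, s, rng)
    for t in range(50):
        s2, r = step(s, a)
        a2 = epsilon_greedy(w, s2, rng)
        # SARSA temporal-difference error and gradient update on w
        td_error = r + gamma * (w @ features(s2, a2)) - w @ features(s, a)
        w += alpha * td_error * features(s, a)
        s, a = s2, a2
        if s == n_states - 1:        # episode ends at the rewarding right end
            break

print("Learned Q-values:\n", w.reshape(n_states, n_actions).round(2))
```

With one-hot features the linear approximator reduces to a lookup table; richer feature maps (for example, tile coding) let the same update rule generalize across similar states, which is the trade-off the tutorial's empirical comparisons examine.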


Similar Resources

Reinforcement Learning Applied to Linear Quadratic Regulation

Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for proble...


Competitive Function Approximation for Reinforcement Learning IRI Technical Report

The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programmin...


Stable Function Approximation in Dynamic Programming

The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that gen...


High-accuracy value-function approximation with neural networks applied to the acrobot

Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy ...


Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative...
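As a rough illustration of the sparse coarse coding this abstract refers to, the sketch below builds tile-coding features over a one-dimensional input. The grid size, number of tilings, and offset scheme are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of sparse coarse coding via tile coding.
# The number of tilings, tiles per dimension, and offsets are
# illustrative assumptions.
import numpy as np

n_tilings, tiles_per_dim = 4, 8      # 4 shifted grids over an input in [0, 1)

def tile_features(x):
    """Binary feature vector: exactly one active tile per tiling (sparse)."""
    phi = np.zeros(n_tilings * tiles_per_dim)
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)   # shift each tiling slightly
        idx = int((x + offset) * tiles_per_dim) % tiles_per_dim
        phi[t * tiles_per_dim + idx] = 1.0
    return phi

# Nearby inputs share most active tiles, so a linear learner generalizes
# between similar states while each update stays cheap and local.
print(tile_features(0.50) @ tile_features(0.52))  # many shared active tiles
print(tile_features(0.50) @ tile_features(0.90))  # few or no shared tiles
```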



Journal:
  • Foundations and Trends in Machine Learning

Volume 6, Issue 4

Pages 375–451

Publication date: 2013